Fuzzy-Semantic Similarity for Automatic Multilingual Plagiarism Detection
Abstract
A word may have multiple meanings or senses. This can be modeled by associating each word in a sentence with a fuzzy set of words that have a similar meaning. Such ambiguity makes plagiarism detection a hard task, especially when it depends on semantic meaning, and harder still for cross-language plagiarism detection. Arabic is known for its richness and for the diversity of its word constructions and meanings, so translating texts from or to Arabic is a complex task, and adopting a fuzzy semantic-based approach therefore seems to be the best solution. In this paper, we propose a detailed fuzzy semantic-based similarity model for analyzing and comparing texts in cross-language plagiarism (CLP) cases, relying on the WordNet lexical database, to detect plagiarism in documents translated from or to Arabic. A preprocessing phase is essential to form operable data for the fuzzy process. The proposed method was applied to two texts (Arabic/English), taking into consideration the specificities of the Arabic language. The results show that the proposed method can detect 85% of the plagiarism cases.
Keywords: CLPD; fuzzy similarity; natural language processing; plagiarism detection; semantic similarity
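The fuzzy-set idea in the abstract can be illustrated with a minimal sketch. This is not the paper's model: the abstract does not give its formulas, so the sketch assumes that a word's membership in a sentence is the maximum similarity to any word of that sentence, and that sentence similarity is the symmetric mean of these memberships. The `TOY_SIM` table is a hypothetical stand-in for a WordNet-based word similarity, the 0.65 threshold is an arbitrary illustrative value, and both inputs are assumed to be already preprocessed and mapped to English tokens.

```python
from typing import Dict, FrozenSet, List

# Hypothetical toy scores standing in for a WordNet-based word similarity.
TOY_SIM: Dict[FrozenSet[str], float] = {
    frozenset({"car", "automobile"}): 1.0,
    frozenset({"fast", "quick"}): 0.9,
    frozenset({"drive", "ride"}): 0.6,
}


def word_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; identical words score 1, unknown pairs 0."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset({a, b}), 0.0)


def membership(word: str, sentence: List[str]) -> float:
    """Assumed fuzzy membership of `word` in `sentence`: best match score."""
    return max((word_similarity(word, w) for w in sentence), default=0.0)


def directed_similarity(source: List[str], target: List[str]) -> float:
    """Mean membership of the words of `source` in `target`."""
    if not source:
        return 0.0
    return sum(membership(w, target) for w in source) / len(source)


def sentence_similarity(s1: List[str], s2: List[str]) -> float:
    """Symmetric fuzzy similarity: average of both directions."""
    return (directed_similarity(s1, s2) + directed_similarity(s2, s1)) / 2


def is_suspicious(s1: List[str], s2: List[str], threshold: float = 0.65) -> bool:
    """Flag a sentence pair whose similarity reaches the (assumed) threshold."""
    return sentence_similarity(s1, s2) >= threshold


if __name__ == "__main__":
    suspicious = ["the", "automobile", "is", "quick"]
    source = ["the", "car", "is", "fast"]
    print(sentence_similarity(suspicious, source))
    print(is_suspicious(suspicious, source))
```

Taking the maximum keeps the membership easy to interpret, since a word counts as matched to the degree of its closest synonym. Other aggregation rules would fit the same structure by changing only `membership`.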
Similar resources
English-Persian Plagiarism Detection based on a Semantic Approach
Plagiarism, which is defined as “the wrongful appropriation of other writers’ or authors’ works and ideas without citing or informing them”, poses a major challenge to the spread and publication of knowledge. Plagiarism has been placed in four categories: direct, paraphrasing (rewriting), translation, and combinatory. This paper addresses translational plagiarism which is sometimes referred to as cross-li...
Análisis de similitud basado en grafos: Una nueva aproximación a la detección de plagio translingüe
The cross-language variant of automatic plagiarism detection tries to detect plagiarism among documents across language pairs. In recent years a few approaches have been proposed that use thesauri, alignment models or statistical dictionaries to deal with the similarity across languages. We propose a new approach to cross-language plagiarism detection that makes use of a multilingual semantic network ...
Fuzzy Semantic-Based String Similarity for Extrinsic Plagiarism Detection - Lab Report for PAN at CLEF 2010
This report explains our plagiarism detection method, which uses a fuzzy semantic-based string similarity approach. The algorithm was developed through four main stages. The first is pre-processing, which includes tokenisation, stemming and stop-word removal. The second is retrieving a list of candidate documents for each suspicious document using shingling and the Jaccard coefficient. Suspicious documents are th...
Code Similarity via Natural Language Descriptions
Code similarity is a central challenge in many programming-related applications, such as code search, automatic translation, and plagiarism detection. In this work, we reduce the problem of semantic relatedness between code fragments to a problem of semantic relatedness of textual descriptions. Our main idea is that we can use the relationship between code and its textual descriptions as esta...
Cross-language plagiarism detection
Cross-language plagiarism detection deals with the automatic identification and extraction of plagiarism in a multilingual setting. In this setting, a suspicious document is given, and the task is to retrieve all sections from the document that originate from a large, multilingual document collection. Our contributions in this field are as follows: (i) a comprehensive retrieval process for cros...